Som-based Clustering of Textual Documents Using Wordnet

نویسندگان

  • Abdelmalek AMINE
  • Zakaria ELBERRICHI
  • Michel SIMONET
  • Mimoun MALKI
چکیده

The classification of textual documents has been the subject of many studies. Technologies like the web and numerical libraries facilitated the exponential growth of available documentation. The classification of textual documents is very important since it allows the users to effectively and quickly fly over and understand better the contents of large corpora. Most classification approaches use the supervised method of training, more suitable with small corpora and when human experts are available to generate the best classes of data for the training phase, which is not always feasible. The non supervised classification or “clustering” methods make emerge latent (hidden) classes automatically with minimum human intervention, There are many, and the SOM (self Organized Maps) by kohonen is one the algorithms for non-supervised classification that gather a certain number of similar objects in groups without a priori knowledge. In this paper, we propose non supervised classification of textual document using the SOM of kohonen and based on a conceptual representation of the texts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation of text clustering methods using wordnet

The increasing number of digitized texts presently available notably on the Web has developed an acute need in text mining techniques. Clustering systems are used more and more often in text mining, especially to analyze texts and to extract knowledge they contain. With the availability of the vast amount of clustering algorithms and techniques, it becomes highly confusing to a user to choose t...

متن کامل

Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents

This paper presents a method using hybridization of self organizing map (SOM ), particle swarm optimization(PSO) and k-means clustering algorithm for document clustering. Document representation is an important step for clustering purposes. The common way of represent a text is bag of words approach. This approach is simple but has two drawbacks viz. synonymy and polysemy which arise because of...

متن کامل

Exploration of textual document archives using a fuzzy hierarchical clustering algorithm in the GAMBAL system

The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool allows to structure the set of documents in a hierarchical way (using a fuzzy ...

متن کامل

Representation of textual documents by the approach wordnet and n-grams for the unsupervised classification (clustering) with 2D cellular automata: a comparative study

In this article we present a 2D cellular automaton (Class_AC) to solve a problem of text mining in the case of unsupervised classification (clustering). Before to experiment the cellular automaton, we vectorized our data indexing textual documents from the database REUTERS 21,578 by Wordnet approach and the representation of text documents by the method n-grams. Our work is to make a comparativ...

متن کامل

Concept Chain Based Text Clustering

Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic relations between words are ignored. In this paper, we propose a novel document representation approach to strengthen the discriminative feature of document objects. We replace terms of documents with concepts in WordNet...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007